Principal Differences Analysis: Interpretable Characterization of Differences between Distributions

Authors

  • Jonas Mueller
  • Tommi S. Jaakkola
Abstract

We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions. The method operates by finding the projection that maximizes the Wasserstein divergence between the resulting univariate populations. Relying on the Cramér-Wold device, it requires no assumptions about the form of the underlying distributions or the nature of their inter-class differences. A sparse variant of the method is introduced to identify features responsible for the differences. We provide algorithms for both the original minimax formulation and its semidefinite relaxation. In addition to deriving some convergence results, we illustrate how the approach may be applied to identify differences between cell populations in the somatosensory cortex and hippocampus as manifested by single cell RNA-seq. Our broader framework extends beyond the specific choice of Wasserstein divergence.
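
The core idea in the abstract (project both samples onto a direction, then measure how far apart the two resulting univariate populations are) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract refers to a minimax formulation and a semidefinite relaxation, while the code below swaps in a naive random search over unit directions. It also assumes equal sample sizes and uses the squared 2-Wasserstein distance as the divergence. The function names `wasserstein_1d_sq` and `best_projection_random_search` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def wasserstein_1d_sq(a, b):
    """Squared 2-Wasserstein distance between two equal-sized 1-D samples.

    In one dimension the optimal matching pairs sorted values, so the
    distance reduces to a mean squared difference of order statistics.
    """
    a_sorted = np.sort(a)
    b_sorted = np.sort(b)
    return float(np.mean((a_sorted - b_sorted) ** 2))


def best_projection_random_search(X, Y, n_directions=2000, seed=0):
    """Naive stand-in for the optimization step (an assumption, not PDA itself):
    try random unit vectors and keep the one whose projections differ most."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    directions = rng.normal(size=(n_directions, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    best_beta, best_value = None, -np.inf
    for beta in directions:
        value = wasserstein_1d_sq(X @ beta, Y @ beta)
        if value > best_value:
            best_beta, best_value = beta, value
    return best_beta, best_value


# Synthetic example: the two samples differ only in feature index 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
Y = rng.normal(size=(300, 5))
Y[:, 2] += 3.0

beta, value = best_projection_random_search(X, Y)
```

On this synthetic data the selected direction `beta` should place most of its weight on the shifted feature, which is the kind of interpretability the abstract describes. A sparse variant would additionally penalize or constrain the number of nonzero entries in `beta`; that step is not shown here.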

Related resources

Study of genetic diversity in native Iranian onions

In order to study the genetic variation among local varieties of onion in Iran, an experiment was conducted in the Research Center, Faculty of Agriculture, Tabriz University. Sixteen populations were evaluated for agronomic characteristics and also total seed proteins via SDS-PAGE. Cluster analysis and principal component analysis were used to group the onion populations under study. Analysis o...

Internal Traits of Eggs and Their Relationship to Shank Feathering in Chicken Using Principal Component Analysis

Chicken eggs are an important source of protein for the growing human population and also serve as repositories of unique genes that could be used worldwide. In chickens, the shank-feathering trait is dominant over the non-feathered shank trait and is based on two factors, pti-1L and pti-1B, located on chromosomes 13, 15, and 24. Using 185 fertile eggs collected from two ...

Application of multivariate techniques in-line with spatial regionalization of AOD over Iran

Introduction: Models, satellites, and terrestrial datasets have been used to detect and characterize aerosol. Nonetheless, the lack of microscale classification using remote sensing parameters is considered a deficiency. Thus, regionalization and modeling aerosol without regard to political boundaries or a specific s...

Parallel Factor Analysis of gait waveform data: A multimode extension of Principal Component Analysis.

Gait data are typically collected in multivariate form, so some multivariate analysis is often used to understand interrelationships between observed data. Principal Component Analysis (PCA), a data reduction technique for correlated multivariate data, has been widely applied by gait analysts to investigate patterns of association in gait waveform data (e.g., interrelationships between joint an...

Publication date: 2015